This file contains the visualizations of the simulation study for the Interval Truth Model. The results of the preliminary simulation study on the two different link functions can be found in the file src/01_simulation_study_link_functions_visualizations.Qmd.
All error bars in this document represent \(\pm 1\) Monte Carlo Standard Error (MCSE).
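For reference, the MCSE of a mean over \(R\) replications is the standard deviation across replications divided by \(\sqrt{R}\). A minimal sketch with simulated, purely illustrative values (the vector `rep_estimates` is hypothetical, not taken from our results):

```r
# hypothetical repetition-wise estimates, for illustration only
set.seed(1)
rep_estimates <- rnorm(500, mean = 0.5, sd = 0.1)

# Monte Carlo Standard Error of the mean: SD across replications / sqrt(R)
mcse <- sd(rep_estimates) / sqrt(length(rep_estimates))
mcse
```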
As the consensus locations and widths have slightly different standard deviations, we use these to standardize the performance measures for ease of interpretability. Obtain the SDs from the data-generation function:
```r
link <- "ilr"
mean_benchmark <- simplex_to_bvn(c(.4, .2, .4), type = link)
sd_benchmark_loc <- simplex_to_bvn(c(.98, .01, .01), type = link)
sd_benchmark_wid <- simplex_to_bvn(c(.495, .01, .495), type = link)

# mean for Tr_loc
mu_Tr_loc <- mean_benchmark[1]
# mean for Tr_wid
mu_Tr_wid <- mean_benchmark[2]
# SD for Tr_loc
sigma_Tr_loc <- sd_benchmark_loc[1] / 4
# SD for Tr_wid
sigma_Tr_wid <- abs(sd_benchmark_wid[2] - mean_benchmark[2]) / 4
# SD for the mean of Tr_loc and Tr_wid
sigma_Tr_interval <- sqrt(sigma_Tr_loc^2 + sigma_Tr_wid^2) / 2
# SD for a_loc on the log-scale
sigma_a_loc <- .3
# SD for b_loc
sigma_b_loc <- sigma_Tr_loc / 3
# SD for b_wid
sigma_b_wid <- sigma_Tr_wid / 3
# SD for lambda_loc and E_loc on the log-scale
sigma_lambda_E_loc <- .3
# SD for lambda_wid and E_wid on the log-scale
sigma_lambda_E_wid <- .3
```
3 Deviations from Preregistration
| Deviation | Explanation |
|---|---|
| We used rstan instead of cmdstanr. | We used rstan due to its ease of installation and use for applied users, and its useful helper functions. This should not influence the quality of our results. |
| We did not use multivariate priors for proficiency and discernibility parameters. | TODO give reasoning. |
| We set \(\omega_j = 0\) in data generation. | We simply forgot to mention this in the preregistration. TODO @Matze, is this correct? |
| We changed the formula for absolute bias and MSE, with the denominator changing from the number of participants to the number of items. | |
| We used 500 instead of 1,000 simulation repetitions. | We underestimated the computational workload, but checked the uncertainty of the point estimates and found it so small that 500 repetitions sufficed, in line with our preregistered criterion of a certain maximum MCSE. |
| We changed the R version. | During the revision, we used the newest R version. This should not change the results. |
| We added additional benchmarks (i.e., simple medians). | We were asked to do so during the revision, and added them because they provide an even stronger test of our model. |
| We added the Pearson correlation between true and estimated parameter values as an additional performance measure. | We were asked to include the performance/recovery of the secondary parameters besides the consensus intervals. The correlation facilitates the interpretation of the recovery results. |
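The changed denominator means that the adjusted performance measures average over the \(J\) items; with item-level estimate \(\hat{\theta}_j\) and true value \(\theta_j\), a sketch of the adjusted formulas (notation ours, not taken verbatim from the preregistration):

\[
\text{AbsBias} = \frac{1}{J}\sum_{j=1}^{J}\bigl|\hat{\theta}_j - \theta_j\bigr|,
\qquad
\text{MSE} = \frac{1}{J}\sum_{j=1}^{J}\bigl(\hat{\theta}_j - \theta_j\bigr)^2
\]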
Additionally, as already explained in the preregistration, we did not preregister the full model setup, including the priors, which left us some freedom in choosing them. We make these choices transparent in the manuscript.
4 Results
The full results of the simulation study are available in two data frames:
- `sim_res_itm_server.rds`: results of the simulation study, missing some MSE results due to a small coding error, but containing all information about seed, RAM usage, etc.
- `sim_res_itm_0808.rds`: resummarized results of the simulation study, now also containing all MSE results as well as more information on divergent transitions.
Prepare data and convert to long format for plotting:
```r
sim_res_itm_long <- sim_res_itm |>
  select(!c(contains("conv"), contains("divtrans"), contains("_sd"))) |>
  # the following columns aren't needed any more for the "resummarized" results
  # "REPLICATIONS", "SIM_TIME", "RAM_USED", "SEED", "COMPLETED", "WARNINGS")) |>
  # delete "_fn" from every column name
  rename_all(~ str_remove(., "_fn")) |>
  pivot_longer(
    cols = !c(n_respondents, n_items),
    names_to = "measure",
    values_to = "value"
  ) |>
  # only look at non-log transformed values
  filter(!grepl("log", measure)) |>
  # rename for easier separation
  mutate(measure = str_replace(measure, "abs_bias", "absbias")) |>
  # add string so that each measure has same number of underscores
  mutate(measure = str_replace(measure, "omega", "omega_cor")) |>
  # remove only the first underscore
  mutate(measure = sub("_", "", measure, fixed = TRUE)) |>
  separate_wider_delim(
    measure,
    names = c("measure", "summary", "pm", "param"),
    delim = "_"
  ) |>
  group_by(n_respondents, n_items, measure, summary, pm) |>
  pivot_wider(names_from = "param", values_from = "value") |>
  ungroup() |>
  mutate(n_respondents = factor(n_respondents))
```
4.1 Convergence and Rhat
We now present the average convergence statistics. All values are averages over all replications. A performance measure containing a \(>\) or a \(<\) gives the proportion of the 500 replications with a certain property. For example, \(\hat{R} < 1.1\) is the average proportion of replications in which \(\hat{R}\) was below 1.1.
```r
# renaming for the table
name_mapping <- c(
  "rhat_mean" = "$\\hat{R}_{\\text{mean}}$",
  "rhat_prop1c1" = "$\\hat{R} < 1.1$",
  "rhat_prop1c05" = "$\\hat{R} < 1.05$",
  "rhat_prop1c01" = "$\\hat{R} < 1.01$",
  "ess_bulk_mean" = "$\\text{ESS}_{\\text{bulk, mean}}$",
  "ess_bulk_prop100" = "$\\text{ESS}_{\\text{bulk > 100}}$",
  "ess_bulk_prop200" = "$\\text{ESS}_{\\text{bulk > 200}}$",
  "ess_bulk_prop300" = "$\\text{ESS}_{\\text{bulk > 300}}$",
  "ess_bulk_prop400" = "$\\text{ESS}_{\\text{bulk > 400}}$",
  "ess_tail_mean" = "$\\text{ESS}_{\\text{tail, mean}}$",
  "ess_tail_prop100" = "$\\text{ESS}_{\\text{tail > 100}}$",
  "ess_tail_prop200" = "$\\text{ESS}_{\\text{tail > 200}}$",
  "ess_tail_prop300" = "$\\text{ESS}_{\\text{tail > 300}}$",
  "ess_tail_prop400" = "$\\text{ESS}_{\\text{tail > 400}}$"
)

sim_res_itm |>
  select(contains("conv")) |>
  summarize(across(everything(), mean)) |>
  # remove "mean_conv." and "mcse_conv." from every column name
  rename_all(~ str_remove(., "mean_conv.")) |>
  rename_all(~ str_remove(., "mcse_conv.")) |>
  pivot_longer(cols = everything()) |>
  # separate mean and mcse based on last underscore
  separate(name, into = c("name", "suffix"), sep = "_(?=[^_]+$)", remove = FALSE) |>
  pivot_wider(names_from = suffix, values_from = value) |>
  mutate(mean = round(mean, 4), mcse = round(mcse, 4)) |>
  mutate(name = name_mapping[name]) |>
  knitr::kable()
```
| name | mean | mcse |
|---|---|---|
| \(\hat{R}_{\text{mean}}\) | 1.0011 | 0.0001 |
| \(\hat{R} < 1.1\) | 0.9998 | 0.0003 |
| \(\hat{R} < 1.05\) | 0.9997 | 0.0003 |
| \(\hat{R} < 1.01\) | 0.9989 | 0.0012 |
| \(\text{ESS}_{\text{bulk, mean}}\) | 3687.8952 | 23.7872 |
| \(\text{ESS}_{\text{bulk > 100}}\) | 0.9998 | 0.0002 |
| \(\text{ESS}_{\text{bulk > 200}}\) | 0.9998 | 0.0003 |
| \(\text{ESS}_{\text{bulk > 300}}\) | 0.9997 | 0.0004 |
| \(\text{ESS}_{\text{bulk > 400}}\) | 0.9996 | 0.0005 |
| \(\text{ESS}_{\text{tail, mean}}\) | 2837.8860 | 4.9501 |
| \(\text{ESS}_{\text{tail > 100}}\) | 1.0000 | 0.0001 |
| \(\text{ESS}_{\text{tail > 200}}\) | 0.9999 | 0.0002 |
| \(\text{ESS}_{\text{tail > 300}}\) | 0.9999 | 0.0002 |
| \(\text{ESS}_{\text{tail > 400}}\) | 0.9999 | 0.0003 |
The divergent transitions are shown in the table below. For readability, we omit the MCSEs here.
Apply a function that retrieves location and width estimates to the results:
```r
locwid_path <- here(
  "sim_results",
  "sim_results_itm_2025-04-28_03-30-27",
  "locwid_bias.rds"
)

# read cached results if the file already exists, otherwise compute and cache them
if (file.exists(locwid_path)) {
  # read the data
  locwid_bias <- readRDS(locwid_path)
} else {
  locwid_bias <- prep_locwid(path_full_results)
  # save the data
  saveRDS(locwid_bias, locwid_path)
}
```
Then create a scatter plot to investigate compensatory behavior, that is, whether the bias of the location is higher when the bias of the width is lower and vice versa. Overall, there does not seem to be strong evidence for compensatory behavior.
```r
# standardize the values
locwid_bias <- locwid_bias |>
  dplyr::mutate(
    loc_bias_std = loc_bias / sigma_Tr_loc,
    wid_bias_std = wid_bias / sigma_Tr_wid
  )

# if fonts aren't working properly with ggExtra
# extrafont::font_import()

# add google fonts
sysfonts::font_add_google("News Cycle", "news")
# use showtext
showtext::showtext_auto()
# windows()
# extrafont::font_import(pattern = "NewsCycle-Regular.ttf")
# extrafont::loadfonts()

scatter_tmp <- locwid_bias |>
  ggplot(aes(x = loc_bias_std, y = wid_bias_std)) +
  geom_point(alpha = 0.5) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  labs(x = "Location Bias (standardized)", y = "Width Bias (standardized)") +
  geom_smooth(method = "loess", se = TRUE) +
  scale_x_continuous(limits = c(0, .85), expand = c(0, 0)) +
  scale_y_continuous(limits = c(0, .85), expand = c(0, 0)) +
  # manually set theme here due to font conflicts
  ggplot2::theme_minimal(base_family = "news") +
  ggplot2::theme(
    # remove minor grid
    panel.grid.minor = ggplot2::element_blank(),
    # title and axis texts
    plot.title = ggplot2::element_text(face = "plain", size = ggplot2::rel(1.2), hjust = 0.5),
    plot.subtitle = ggplot2::element_text(size = ggplot2::rel(1.1), hjust = 0.5),
    axis.text.x = ggplot2::element_text(face = "plain", size = ggplot2::rel(1.05)),
    axis.text.y = ggplot2::element_text(face = "plain", size = ggplot2::rel(1.05)),
    axis.title.x = ggplot2::element_text(face = "plain", size = ggplot2::rel(1.3)),
    axis.title.y = ggplot2::element_text(face = "plain", size = ggplot2::rel(1.3)),
    axis.line = ggplot2::element_line(colour = "#6d6d6e"),
    # faceting
    strip.text = ggplot2::element_text(face = "plain", size = ggplot2::rel(1.1), hjust = 0.5),
    strip.text.x.top = ggplot2::element_text(face = "plain", size = ggplot2::rel(1.2), hjust = 0.5),
    # strip.text.y = element_blank(),
    strip.background = ggplot2::element_rect(fill = NA, color = NA),
    # grid
    panel.grid = ggplot2::element_line(colour = "#F3F4F5"),
    # legend
    legend.title = ggplot2::element_text(face = "plain"),
    legend.position = "top",
    legend.justification = 1,
    # panel/facets
    panel.spacing.x = ggplot2::unit(1.6, "lines"),
    panel.spacing.y = ggplot2::unit(1.6, "lines"),
    # remove vertical grid lines
    panel.grid.major.x = ggplot2::element_blank()
  ) +
  theme(text = element_text(size = 28))

# show marginal distributions as histograms
# doesn't work with custom font
# scatter_locwid <- ggExtra::ggMarginal(scatter_tmp, type = "histogram", bins = 50) +
#   theme_minimal(base_family = "news")

ggsave(here("plots", "sim_main", "scatter_locwid.pdf"), scatter_tmp, width = 7, height = 5)
scatter_tmp
```
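As a quick numeric complement to the scatter plot, one could check the Pearson correlation between the standardized biases; a clearly negative value would suggest compensatory behavior. A minimal sketch, assuming `locwid_bias` with the standardized columns created above:

```r
# correlation between standardized location and width bias;
# values near 0 indicate no strong compensatory behavior
cor(locwid_bias$loc_bias_std, locwid_bias$wid_bias_std, use = "complete.obs")
```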
4.2.4 Visualization of Raw Results
Again, apply a function that retrieves location and width estimates to the results:
```r
raw_absbias_path <- here(
  "sim_results",
  "sim_results_itm_2025-04-28_03-30-27",
  "raw_absbias.rds"
)

# read cached results if the file already exists, otherwise compute and cache them
if (file.exists(raw_absbias_path)) {
  # read the data
  raw_absbias <- readRDS(raw_absbias_path)
} else {
  # function to obtain the raw results
  raw_absbias <- prep_locwid(
    path_full_results,
    simplemeans = TRUE,
    simplemedians = TRUE
  )
  # save the data
  saveRDS(raw_absbias, raw_absbias_path)
}
```
Instead of only showing aggregate performance measures, we can also visualize the raw, repetition-wise performance measures to investigate their variability and potential skewness.
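For instance, the spread and skewness of the repetition-wise absolute bias could be shown as faceted histograms per sample-size condition. A sketch, assuming `raw_absbias` contains columns named `absbias`, `n_respondents`, and `n_items` (the column names are illustrative, not verified against the actual object):

```r
# distribution of repetition-wise absolute bias per condition
# (column names are assumed for illustration)
raw_absbias |>
  ggplot(aes(x = absbias)) +
  geom_histogram(bins = 50) +
  facet_grid(n_respondents ~ n_items) +
  labs(x = "Absolute bias (per repetition)", y = "Count")
```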